Dynamics of Boltzmann Q learning in two-player two-action games.

Authors

  • Ardeshir Kianercy
  • Aram Galstyan
Abstract

We consider the dynamics of Q-learning in two-player two-action games with a Boltzmann exploration mechanism. For any nonzero exploration rate the dynamics is dissipative, which guarantees that agent strategies converge to rest points that are generally different from the game's Nash equilibria (NEs). We provide a comprehensive characterization of the rest-point structure for different games and examine the sensitivity of this structure with respect to the noise due to exploration. Our results indicate that for a class of games with multiple NEs the asymptotic behavior of the learning dynamics can undergo drastic changes at critical exploration rates. Furthermore, we demonstrate that, for certain games with a single NE, it is possible to have additional rest points (not corresponding to any NE) that persist for a finite range of the exploration rates and disappear when the exploration rates of both players tend to zero.
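As a concrete illustration of the learning rule analyzed here, the sketch below simulates two independent stateless Q-learners with Boltzmann (softmax) exploration in a 2x2 coordination game. The payoff matrices, learning rate, temperature, and iteration count are illustrative assumptions; the paper itself studies the continuous-time dynamics of this process rather than any particular simulation.

```python
import numpy as np

def boltzmann(q, temperature):
    """Boltzmann (softmax) policy over Q-values at a given exploration temperature."""
    z = q / temperature
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Illustrative 2x2 coordination game (payoffs are an assumption, not from
# the paper): row player receives A[i, j], column player receives B[i, j].
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
B = A.copy()

rng = np.random.default_rng(0)
alpha, T = 0.01, 0.1                  # learning rate and exploration temperature
q_row, q_col = np.zeros(2), np.zeros(2)

for _ in range(50_000):
    x, y = boltzmann(q_row, T), boltzmann(q_col, T)
    i = rng.choice(2, p=x)            # row player's action
    j = rng.choice(2, p=y)            # column player's action
    # Stateless Q-learning: move the chosen action's value toward the payoff.
    q_row[i] += alpha * (A[i, j] - q_row[i])
    q_col[j] += alpha * (B[i, j] - q_col[j])

print("row strategy:", boltzmann(q_row, T))
print("column strategy:", boltzmann(q_col, T))
```

With a small temperature the learned policies settle near one of the pure NEs, while raising T pulls the rest point toward the uniform mixture; this is the kind of exploration-rate dependence the abstract describes.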

Similar articles

Dynamics of Softmax Q-Learning in Two-Player Two-Action Games

We consider the dynamics of Q-learning in two-player two-action games with a Boltzmann exploration mechanism. For any non-zero exploration rate the dynamics is dissipative, which guarantees that agent strategies converge to rest points that are generally different from the game's Nash equilibria (NE). We provide a comprehensive characterization of the rest point structure for different games, and ...

Q-learning in Two-Player Two-Action Games

Q-learning is a simple, powerful algorithm for behavior learning. It was derived in the context of single-agent decision making in Markov decision process environments, but its applicability is much broader: in experiments in multiagent environments, Q-learning has also performed well. Our preliminary analysis finds that Q-learning's indirect control of behavior via estimates of value contribut...
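For reference, this is the standard single-agent tabular update this blurb alludes to; the table shape and parameter values below are arbitrary placeholders.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for an MDP:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((3, 2))                        # assumed 3 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)  # one sampled transition
```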

Reinforcement Learning in Multi-agent Games

This article investigates the performance of independent reinforcement learners in multiagent games. Convergence to Nash equilibria and parameter settings for desired learning behavior are discussed for Q-learning, Frequency Maximum Q value (FMQ) learning and lenient Q-learning. FMQ and lenient Q-learning are shown to outperform regular Q-learning significantly in the context of coordination ga...
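Lenient Q-learning is commonly described as forgiving low payoffs caused by the partner's exploration. The sketch below implements one simplified variant in which each action buffers a few payoff samples and updates toward the best of them; the buffer size, class name, and parameters are assumptions for illustration, not the exact algorithms compared in the article.

```python
import numpy as np

class LenientQLearner:
    """Simplified lenient Q-learner for a stateless two-action game:
    each action collects `kappa` payoff samples and updates toward their
    maximum, ignoring low payoffs due to the partner's exploration."""

    def __init__(self, n_actions=2, alpha=0.1, kappa=5):
        self.q = np.zeros(n_actions)
        self.alpha = alpha
        self.kappa = kappa
        self.buffers = [[] for _ in range(n_actions)]

    def update(self, action, reward):
        buf = self.buffers[action]
        buf.append(reward)
        if len(buf) >= self.kappa:        # lenient step: keep only the max
            self.q[action] += self.alpha * (max(buf) - self.q[action])
            buf.clear()
```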

Convergent Multiple-timescales Reinforcement Learning Algorithms in Normal Form Games

We consider reinforcement learning algorithms in normal form games. Using two-timescale stochastic approximation, we introduce a model-free algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely...
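The two-timescale idea can be sketched as two coupled stochastic updates whose step sizes separate asymptotically: a fast one tracking payoff estimates and a slow one moving the mixed strategy toward a smoothed (logit) best response. Everything below, including the payoff matrix, opponent strategy, and step-size schedules, is an illustrative assumption rather than the article's exact algorithm.

```python
import numpy as np

def logit_response(q, tau=0.1):
    """Smoothed (logit) best response to payoff estimates q."""
    z = q / tau
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

A = np.array([[3.0, 0.0],
              [5.0, 1.0]])               # assumed payoff matrix
y = np.array([0.5, 0.5])                 # assumed fixed opponent mixture
rng = np.random.default_rng(1)

q = np.zeros(2)                          # fast variable: payoff estimates
x = np.array([0.5, 0.5])                 # slow variable: own mixed strategy

for n in range(1, 100_000):
    a_n = 1.0 / n**0.6                   # fast step size
    b_n = 1.0 / n                        # slow step size, b_n / a_n -> 0
    i = rng.choice(2, p=x)
    j = rng.choice(2, p=y)
    q[i] += a_n * (A[i, j] - q[i])       # fast: track realized payoffs
    x += b_n * (logit_response(q) - x)   # slow: drift toward smooth best response
    x = np.clip(x, 1e-9, 1.0)
    x /= x.sum()

print("strategy:", x, "estimates:", q)
```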

Journal:
  • Physical Review E: Statistical, Nonlinear, and Soft Matter Physics

Volume: 85, Issue: 4 Pt 1

Pages: -

Published: 2012